Configure MetaSpore to Access S3

MetaSpore supports S3 as a storage backend for reading sample data and writing model output.

There are two ways to configure S3 access, depending on whether MetaSpore runs on AWS EC2.

1. Access S3 on AWS EC2

AWS EC2 supports IAM Role credentials, so no extra configuration is needed.

Note: In AWS China regions, you need to set the environment variable AWS_REGION=cn-north-1; otherwise the AWS SDK looks up buckets outside China by default. Export the variable before executing MetaSpore training:

export AWS_REGION=cn-north-1
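
If you launch training from Python rather than a shell, an equivalent minimal sketch is to set the variable in-process before MetaSpore or any AWS SDK client is initialized (os.environ here is standard Python, not a MetaSpore API):

import os

# Equivalent to `export AWS_REGION=cn-north-1`; must run before any
# AWS SDK client is created in this process.
os.environ['AWS_REGION'] = 'cn-north-1'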

To run distributed MetaSpore training on Spark, you need to set the environment variable on all executors:

from pyspark.sql import SparkSession

spark_session = SparkSession.builder \
    .config('spark.executorEnv.AWS_REGION', 'cn-north-1') \
    .getOrCreate()
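
Note that spark.executorEnv.* settings only affect executor processes; the driver still reads AWS_REGION from its own environment, so keep the export shown above as well.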

2. Access S3 Outside AWS EC2

In this case you need to set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. For S3-compatible services such as Alibaba Cloud OSS, Huawei Cloud OBS, Tencent Cloud COS, and MinIO, the AWS_ENDPOINT environment variable is also required:

export AWS_ENDPOINT=<endpoint url>
export AWS_ACCESS_KEY_ID=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your access key>

For distributed Spark jobs, set the corresponding spark.executorEnv.AWS_* configs on all executors, as shown above; see the sketch below.
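
Putting this together, here is a minimal sketch of building a Spark session for a distributed job against an S3-compatible store. It assumes the three variables above are already exported in the driver's environment and simply forwards them to the executors:

import os
from pyspark.sql import SparkSession

# Forward the driver's AWS_* environment variables to every executor.
spark_session = SparkSession.builder \
    .config('spark.executorEnv.AWS_ENDPOINT', os.environ['AWS_ENDPOINT']) \
    .config('spark.executorEnv.AWS_ACCESS_KEY_ID', os.environ['AWS_ACCESS_KEY_ID']) \
    .config('spark.executorEnv.AWS_SECRET_ACCESS_KEY', os.environ['AWS_SECRET_ACCESS_KEY']) \
    .getOrCreate()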

For endpoint URLs, refer to your cloud service provider's documentation:

Alibaba Cloud OSS: https://www.alibabacloud.com/help/en/object-storage-service/latest/regions-and-endpoints

Tencent Cloud COS: https://intl.cloud.tencent.com/document/product/436/6224?lang=en&pg=

Huawei Cloud OBS: https://developer.huaweicloud.com/intl/en-us/endpoint?OBS